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ABSTRACT 

The standard method of modelling axisymmetric stellar systems begins 
from the assumption that mass follows light. The gravitational potential is 
then derived from the luminosity distribution, and the unique two-integral 
distribution function f{E, L^) that generates the stellar density in this potential 
is found. It is shown that the gravitational potential can instead be generated 
directly from the velocity data in a two-integral galaxy, thus allowing one to drop 
the assumption that mass follows light. The two-dimensional rotational velocity 
field can also be recovered in a model-independent way. Regularized algorithms 
for carrying out the inversions are presented and tested by application to 
pseudo-data from a family of oblate models. 



INTRODUCTION 



Modelling of elliptical galaxies has a long history, extending back to a time when it 
was taken for granted that elliptical galaxies were axisymmetric and that all of their mass 
could be accounted for in stars. We now know that dark matter is a commmon component 
of galaxies, both at large (Ashman 1992) and small (Kormendy & Richstone 1995) radii, 
and that triaxial configurations are possible ones for stellar systems (Schwarzschild 1979). 

One goal of modern dynamical studies is accordingly to verify the presence of dark 
matter; another is to detect departures from axisymmetry. Because triaxial models are 
difficult to construct, one often begins by looking for simpler models that are consistent 
with the data. This "model-building" approach is popular since it requires a minimum of 
thought about the information content of the data: one simply builds a model and checks 
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whether it reproduces the observations. But one can take a more sophisticated approach, 
and ask whether the data imply anything definite about the observed stellar system. For 
instance, one might attempt to falsify the axisymmetric hypothesis in a model-independent 
way. This "inverse problem" approach is more difficult but also more rewarding, since it 
leads to more secure conclusions about the dynamical state of the galaxy. 

Beginning with Binncy, Davies & Illingworth's (1990) pioneering study, a standard 
scheme has been developed for modelling elliptical galaxies. One assumes that the observed 
galaxy is axisymmetric and that the distribution of mass is known - for instance, mass 
might be assumed to follow light. The gravitational potential is then derived from this mass 
distribution using Poisson's equation; the unique two-integral distribution function f{E, L^) 
that reproduces the stellar density in this potential (or more exactly, the even part of /) 
can also be found (Lynden-Bell 1962; Hunter 1975). The odd part of /, which determines 
the rotational velocity field, is usually represented via some ad hoc parametrization. One 
then projects this derived / back into observable space to find the predicted kinematical 
variables and compares them with the observations. If there is agreement, one can claim to 
have found an axisymmetric model that is consistent with the data. 

This technique has been used to reproduce the kinematical data for a few, well-observed 
elliptical galaxies (van der Marel et al. 1994; Dehnen 1995; Qian et al. 1995). But it is 
difficult to state precisely what has been learned from studies like these. Are the observed 
galaxies actually characterized by a constant M/L, axisymmetry, and a two-integral 
distribution function, as assumed? Or might there exist models with spatially-varying 
M/L's or three-integral /'s that reproduce the data equally well? And if no model consistent 
with the data can be found, which of the various assumptions made in the model-building 
has been violated? 

A useful comparison can be made here with spherical systems (§2.1). If one assumes 
that the gravitational potential $(r) of a spherical galaxy is known, then the observed 
velocity dispersion profile can be used to infer the unique dependence of anisotropy on 
radius (Binney & Mamon 1982). Alternatively, one can assume that the distribution 
of stellar velocities is everywhere isotropic, in which case the velocity dispersion profile 
implies a unique form for $(r) (Merritt & Gebhardt 1994). Assuming both isotropy and 
a constant M/L would clearly be an over- determination of the problem, since the two 
assumptions together imply a unique velocity dispersion profile, and this predicted profile 
would almost certainly be different from the observed profile - indeed, the kinematical data 
would not have been used at all in the construction of the model. But this is precisely 
what is commonly done in modelling axisymmetric galaxies: one assumes both isotropy in 
the meridional plane (i.e. / = f{E, L^)) as well as an ad hoc form for the potential, and 
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these two assumptions together imply a unique dependence of the mean square velocity 
on position over the image of the galaxy. It is unlikely that a model constructed in this 
way would reproduce the two-dimensional velocity dispersion data; and if it did not, there 
would be no clear indication of which assumption had been violated. 

It is clearly desirable to think more carefully about the information content of 

kincmatical data in axisymmctric stellar systems. Unlike the spherical case, the functions 
to be derived depend on two spatial variables, but the kincmatical data are likewise 
two-dimensional, depending on both projected radius and position angle. Many such data 
sets now exist, for stars in globular clusters (Mcylan et al. 1995) and in galaxies (Bacon et 
al. 1994); emission- line objects around the Galactic bulge (te Linkel Hekkert et al. 1991) 
and around other galaxies (Hui et al. 1995); and galaxies in clusters (CoUess & Dunn 
1996). These data are most often in the form of discrete velocities, and in the best-studied 
systems, the measured velocities number in the hundreds or even thousands. 

Here it is shown (§2.2) that the availability of two-dimensional kincmatical data allows 
one to approach the dynamical study of axisymmetric stellar systems as an inverse problem 
rather than as a model-building problem. Prom the single Ansatz f — f{E, L^), one can 
infer the unique gravitational potential ^{zu,z) that is consistent with the observed first 
and second velocity moments of /. The rotational velocity field can likewise be obtained 
in a model-independent way. By using the kincmatical data in the construction of the 
model, rather than assuming that mass follows light, one thus arrives at the unique pair 
of functions {/, $} that are consistent with the two-integral hypothesis. If this unique 
model can be falsified - for instance, by using hne-of-sight velocity distributions, proper 
motions, or some other data - then the two-integral, axisymmetric hypothesis will have 
been convincingly ruled out. In the model-building approach, on the other hand, one can 
not discard the two-integral assumption until one has explored an effectively infinite number 
of possible forms for the potential (§2.3). 

The numerical techniques that are required for converting the velocity data into a map 
of the potential are rather more subtle than the ones that have been applied up to now in 
the study of axisymmetric galaxies. These techniques are accordingly described in some 
detail (§3.1) before applying them to pseudo-data generated from a family of oblate models 
(§3.2, 3.3). 

Although this approach yields the unique potential consistent with the two-integral 
assumption for /, the true potential might be different if / depends on a third integral. 

However it would be premature to investigate three-integral models until one had first 
used the algorithms described here, or equivalent ones, to rule out a two-integral model. 
Having done so, one could then search for the pair of functions {^{zu,z), f^E^L^^I^)} 
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that are most consistent with the data. This is a hard problem, and even in the spherical 
geometry there is no published algorithm that can extract $(r) and f{E, LP') in a completely 
model-independent way from the data; the most sophisticated such algorithms are still 
based on parametrized forms for the potential (Saha & Merritt 1993; Merritt 1993b). 

The focus here is on situations where the velocities are measured discretely, as would 
be the case for stars in a globular cluster. Most large, kinematical data sets are of this form. 

It is also assumed throughout that the observer lies in the equatorial plane of the 
observed system. The reason for this unpleasant assumption is that deprojection of the 
luminosity density becomes underdetermined if the inclination angle is less than 7r/2, even 
if this angle is assumed known (Rybicki 1986; Gerhard & Binney 1996). One can therefore 
probably not hope to uniquely infer the dynamical state of an axisymmetric galaxy that is 
not viewed edge-on. This fact is routinely ignored in the model-building studies but must 
be faced squarely if one wishes to solve the inverse problem. 

Also presented here (§3.4) is the first regularized algorithm capable of deriving f{E, L^) 
for an axisymmetric system from its density and potential (the Lynden-Bell - Hunter 
problem) given imperfect or incomplete information about those functions. 



Our starting point for the axisymmetric inverse problem is the simpler spherical 
problem. A spherical stellar system has a distribution function that depends on the orbital 
energy E and angular momentum L (both defined here per unit mass). Spherical symmetry 
in velocity space is assumed as well; then / = f{r,v) — f{E) — f[v'^/2 + $(r)], and the 
radial and azimuthal velocity dispersions are everywhere equal, a"r(r) = Ct(r) = cr{r). 

Having measured a set of discrete positions and velocities, one can compute smooth 
approximations to the surface density profile S(i?) and the line-of-sight velocity dispersion 
profile (Tp{R). The function (7(r) that defines the intrinsic velocity dispersion is then related 
to the observed velocity dispersions via 



2. 



THE INVERSE PROBLEM 



2.1. Spherical Systems 




(1) 



which when inverted gives 
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Here z/(r) is the spatial density of the kinematical sample, obtained by deprojecting the 
surface density S(i?): 

1 /•°° c/S dR 



The mass within r and the potential follow from Jeans's equation in spherical symmetry: 

GM{r) = r — = -ra r — + — , 4 

dr \d\ogr ct log r J 

and the mass density is 

^ ^ _ dM 
Airr'^ dr 

Finally, the unique isotropic distribution function describing the kinematical sample is given 
by Eddington's formula: 

..px _ _J: d_ dv d^ 

' VS-K^dEJE rf$v/$^' ^ ' 

Thus both / and $ follow uniquely from the observed surface density and velocity dispersion 
profiles, under the sole assumption of spherical symmetry in velocity space. Q Here we are 
treating the stars as if they were ions in an X-ray emitting gas; the quantity cr^(r) plays the 
role of kT{r)/m in the gas, and no assumption has been made that mass follows light. 

In spite of its limitations, this approach has the following things to recommend it. 

1. One begins from a statement of the problem that is fully determined: there 
is a unique pair of functions {/(i?),$(r)} that reproduce any set of observed profiles 

ap{R)}. In practice one does not measure or ap{R) precisely, but nonparametric 
algorithms can be defined which yield estimates {/, that are suitably close to the true 
functions (Merritt & Gebhardt 1994). 

2. The dependence of $ on r is determined from the velocity data, as it should 
be, rather than being derived from the luminosity density under the always- questionable 
assumption that mass follows light. 

3. The validity of the single assumption - that / is spherically symmetry in velocity 
space - can be checked. One simply projects the derived model to find the predicted, joint 
distribution of positions and line-of-sight velocities: 

aAV) dr^ r^Mr)-vl 



NiR, Vp) = . J^^ ^=== f [vV2 + v;/2 + $(r)J dv'\ (7) 



^Strictly speaking, ar — at does not imply / = f{E)] one can construct "isotropic" stellar systems with 
distribution functions that depend on as well. 
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If this function is consistent with the observed hne-of-sight velocity distribution at every 
radius, then the model is fully consistent with the data (assuming that no other relevant 
data exist); if not, isotropy can be ruled out, independent of any preconceptions about the 
radial variation of M/L. 

Often the kinematical data will be insufficient in quality or quantity to produce 
a statistically significant discrepancy with the computed N{R,Vp), because anisotropy 
generates only slight changes in the form of the line profiles and very high quality data 
are required to detect these changes (Gerhard 1993; Merritt 1993b). Nevertheless this 
approach makes manifest a point that is often misunderstood: one can not, in principle, 
make inferences about the degree of velocity anisotropy in a spherical system from the 
velocity dispersion data alone, since one can always construct the potential in such a way 
as to reproduce any observed ap{R). 

4. For some stellar systems - e.g. globular clusters - the assumption of isotropy in 
velocity space can be physically motivated. 



Our goal is to identify a similar inverse problem for axisymmetric stellar systems. We 
wish to show that complete knowledge of the line-of-sight velocity moments Vp and on the 
plane of the sky, and of the surface density S of the kinematical population, is equivalent 
to knowledge of the two-integral distribution function f{E,Lz) describing that population 
and of the potential ^{zu,z) in which it moves - independent of any assumptions about the 
relative distribution of mass and light. As discussed above, it is assumed throughout that 
the observer lies in the equatorial plane of the observed system. 

The Jeans equations that relate the potential ^{zu,z) of an axisymmetric system to 
gradients in the velocity dispersions are 



Here {zu, z, (f)) are the usual cylindrical coordinates; all quantities are assumed independent 



Now set / = f{E,Lz). The velocity dispersions are then isotropic in the meridional 



2.2. Axisymmetric Systems 




(8) 




of 0. 
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plane, a = cr^ 



(Jz, and the Jeans equations reduce to 
9$ d{iya^) 



V 



dz 
dw 



dz ' 
dw 



V 



w 



a 



(10) 
(11) 



Define Cartesian coordinates (X, Z) on the plane of the sky, with Z parallel to the 
symmetry axis and Y along the line of sight. We assume that v{w, z) has been derived 
from S(X, Z) via the usual inversion, unique for an edge-on galaxy (Rybicki 1986). ^{vj, z) 
is still unknown. But if we multiply the first Jeans equation by d/dw v^^ and the second 
by d/dz and equate, we find a relation between a^, and cr that is independent of $: 



du da"^ du da'^ ^ ^ f 2 2 



0. 



dw dz dz dw ' w dz 
This relation may be seen as specifying a unique given a and U^, i.e. 

du da"^ du da"^ 



^4>{-^,z) + a^{w,z) =vl{w,z) 



-w 



V [W,Z) 



dw dz dz dw 



(12) 



dz + a'^{w,z). (13) 



Now consider the rotational velocity field. The mean line-of-sight velocity Vp{X, Z) is 
related to the internal streaming velocity v^{w, z) via the projection integral 



S(X, Z)vp{X, Z) = 2X u{w, Z)v^{w, Z) 

J X 



dw 



w^ 



X2 



(Fillmore 1986, Eq. 12). Equation (|T^ has inversion 

I d 



v[w, z)v^[w, z) 



-w- 



TT Jm dX 



X-^^{X,z)vp{X, z) 



dX 



- W^ 



(14) 



(15) 



Note that is completely determined by Vp independent of the two-integral assumption - 
all that is required here is the assumption of axisymmetry. The possibility of computing the 
rotational velocity field via direct inversion of the data has been pointed out (though never 
implemented) by a number of authors including Meylan & Mayor (1986) and Merrifield 
(1991). 



The mean square line-of-sight velocity is 

X2 



T.{X,Z)vl{X,Z) = 2 v{w,Z) 



X 



I- \a\w,Z) + —vl{w,Z) 



w^ 



w^ 



wdw 



w^ 



X2 



(16) 



(Fillmore 1986, Eq. 16). The integrand on the right hand side contains the two unknown 
functions cr^ and = v'l + al, and at first sight does not admit of a unique inversion. But 



- 8 - 



we have a relation between cr^ and v"^, equation (|T3|). We can therefore replace v"^ in the 
integrand by a linear expression that depends only on and on u. Although no analytic 
expression for the inverse relation could be found by the author, it will be shown below that 
a'^{zj, z) is numerically well-determined by this equation. Having derived cr^(-a7, z), v'^{zu, z) 
then follows from equation ([13D , and cf"^ = v"^ — v'^^. 

It has thus been shown that ^(^(cc,^), a^{w^z) and a{w^z) are determined by the 
two-dimensional velocity moment data in an edge-on, two-integral system without any 
assumptions aside from isotropy in the meridional plane. It follows that the potential 
(^{w,z) and the mass density p{'UJ,z) are also known: the former from equations (10) or 
(11), the latter from Poisson's equation: 



p{w,z) 



AtxG 



— 



w dw \ dw I dz"^ 



(17) 



So derived, p will of course not necessarily be proportional to v. 



Finally, there is a unique distribution function f{E,Lz) describing the kinematical 
population that reproduces u and in the derived potential. As usual, we divide / into 
odd and even parts with respect to L^: 

f{E, L,) = U{E, L,) + f4E, L,), U{E, L,) = ^ [f{E, L,) ± f{E, -L,)] . (18) 

The number density is related to /+ via 

4:71 fO fmy/2(E-<S>) 

u{zu,z) = — dE U{E,L,)dL,, (19) 



tU J# JO 

and the rotational velocity field is related to /_ via 

u{w,z)v^{w,z) = — dE f_{E,L,)LJL,. (20) 

zu Jo 

These equations imply unique functions /+ and /_ given z/, v^f, and $ (Lynden-Bell 1962; 
Hunter 1975; Dejonghe 1986), although the inverse relations are complex. Once again, 
since mass is not assumed proportional to light, the population described by this / will not 
necessarily have the same spatial distribution as the matter that determines the potential. 

Practical algorithms for carrying out the inversions, suitable for real (i.e. noisy and 
incomplete) data, are described below. 

Just as in the spherical case, the validity of the assumed relation between 0"^ and 
can be verified by computing the full distribution of line-of-sight velocities N{X, Z, Vp) from 
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the model and comparing with the observed velocity distributions; or by carrying out the 
same procedure with the predicted and observed proper motions; etc. If these predicted 
and observed distributions are significantly different, one can conclude that the assumption 
of isotropy in the meridional plane is not valid (or that the system is not being viewed 
edge-on, or is not axisymmetric) and proceed to investigate more general models. 

Note that - just as in the spherical case - one can not infer the presence of velocity 
anisotropy (cr^ 7^ a^) based on inspection of the velocity dispersions alone, since the 
potential can always be constructed so as to reproduce the observed dispersions without 
a third integral. Information about the dependence of / on a third integral is contained 
only in the higher-order moments of the line-of-sight velocity distribution (or the proper 
motions); and conversely, there is no need to use that extra information unless one intends 
to investigate three-integral models - the two-integral problem is fully constrained by the 
the first and second velocity moments of /. 

2.3. Comparison with Other Methods 

It is interesting to compare this approach to the one pioneered by Binney, Davies & 
lUingworth (1990); their technique has recently formed the basis of a number of data-based 
studies by van der Marel and collaborators (van der Marel, Binney & Davies 1990; van der 
Marel 1991; van der Marel et al. 1994). Similar approaches to the axisymmetric inverse 
problem have been worked out by Mcrrifield (1991), Dejonghe (1993), Emsellem, Monnet & 
Bacon (1994), Dehnen (1995), Kuijken (1995), and Qian et al. (1995). All of these authors 
assume at the outset that mass follows light (or that the mass distribution has some other 
ad hoc form, often including a central point mass), and derive the two-dimensional potential 
from the assumed mass distribution using Poisson's equation. The depth of the potential, 
or equivalently the mass-to-light ratio, is obtained from the virial theorem. The observed 
velocities are not otherwise used in the construction of the model, except insofar as they 
constrain the form of the odd part of /, which is typically represented via some simple 
parametrization. 

The advantage of this method is its simphcity: once the potential is specified, the 
even part of the distribution function /+(£■, L^) or any of its moments can be obtained 
(Lynden-Bell 1962; Hunter 1975; Dejonghe 1986). But this simphcity is achieved at the cost 
of having to assume the forms of ^{zu, z) and /_ at the outset. It was shown above that 
the potential follows uniquely from the observed Vp{X, Z) and v^{X, Z) in a two-integral, 
edge-on system - one is not free to assume arbitrary functional forms for $. Furthermore 
the odd part of / can be derived in a model-independent way from the observed rotational 
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velocity field. It follows that the approach taken by these authors is overdetermined. 
Models constructed in an assTimcd potential are unlikely to reproduce a two-dimensional 
set of velocity dispersions, since the information contained in those data was not used. And 
if the model should fail to reproduce the kinematical data, the failure could be due to the 
breakdown of either of two assumptions: that mass follows light, or that / depends on only 
two integrals. In the method described here, on the other hand, the potential is constructed 
to have the unique form that is consistent with the two-integral assumption for / and 
with its lowest observable moments. A failure of the model to reproduce any additional 
kinematical data could only mean that / 7^ f{E, L^). 

The model-building technique has nevertheless been successfully applied to a few 
galaxies, notably M32 (van der Marel et al. 1994; Dehnen 1995; Qian et al. 1995). But this 
apparent success should be interpreted with caution. It is possible that these galaxies satisfy 
all of the assumptions made in the model building, i.e. axisymmetry, / = f{E,Lz), and 
M/L = constant. But it is also possible that the published models would fail to reproduce 
a full, two-dimensional set of velocities if such data were available. Future observations of 
these galaxies should reveal which of these two possibilities is correct. 

Of course, Binney, Davies & lUingworth (1990) were concerned with situations where 
the kinematical data are confined to just a few cuts across the image of the galaxy. Given 
such limited data, their approach seems entirely justified. At the same time, our discussion 
makes clear that - when two-dimensional data are available, as is increasingly often the 
case - one can and should infer the kinematics and potential directly from the velocities. 

Merrifield (1991) proposed a test for the sufficiency of two-integral models when 
describing edge-on, axisymmetric galaxies. His test is based on the assumption that mass 
follows light, and uses as a discriminant the measured velocity dispersions. Since one 
can always construct the potential so as to reproduce the velocity dispersion data in a 
two-integral system, as shown here, conclusions based on Merrifield's test are contingent on 
the assumption that mass follows light. 

3. PRACTICAL INVERSION ALGORITHMS 

The formal solutions given above to the axisymmetric inverse problem are of little 
use when dealing with real data, since the inversion equations are ill-conditioned in the 
sense understood by statisticians (Miller 1974; O'Sullivan 1986). This means that errors or 
incompleteness in the data will be amplified when going from data space to model space 
unless some objective smoothness condition is placed on the solution. The degree to which 
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errors are amplified depends on the number of effective differentiations tliat separate tlie 
model from tlie data. For instance, tlie distribution function f{E, L^) is essentially a second 
derivative of the number density (Eq. [TP|) which is itself a half-order derivative of the 
surface brightness (Eq. ^ - hence the determination of / is strongly ill-conditioned even 
though formally unique. Some authors have ignored this element of the problem and carried 
out direct deprojections without imposing smoothness constraints. The results tend to be 
spectacularly noisy (e.g. Figure 2 of Kuijken 1995). Another approach, equally ill-advised, 
is to replace the data by ad hoc fitting functions for which the inversions can be carried out 
exactly (e.g. Qian et al. 1995). Unless these smooth functions are generated from the data 
using a nonparametric algorithm, they will contain a bias that is likewise amplified by the 
inversion. Solutions obtained in this way may look appealing in a sense but are likely to 
be far from the true solutions. 

Richardson-Lucy iteration (e.g. Dehnen 1995) is genuinely nonparametric, going in 
discrete steps from an initial guess to the (ill-defined) solution that would have been 
obtained via direct inversion of the data. However the success of this technique depends 
on the accuracy of the initial guess. A bad guess will require many iterations before 
the projected solution approximates the data, at which point the solution is likely to be 
unacceptably noisy. One typically halts the iteration before this occurs but the solution so 
obtained will accordingly be biased in the direction of the initial guess by some unknown 
amount. Furthermore, solutions obtained in this way tend to achieve their optimal degree 
of smoothness in one part of solution space while remaining undersmoothed in other parts. 

Our goal is an algorithm which generates solutions that are characterized by acceptably 
small values of both the noise and the bias. Modern nonparametric methods (Miiller 1980; 
Hardle 1990; Wahba 1990; Scott 1992; Green & Silverman 1994) are designed with precisely 
this goal in mind. These methods make no assumptions about the global form of the 
solutions; they deal with the ill-conditioning by imposing smoothness via an effectively local 
constraint on the level of fluctuations in the solution, an approach called "regularization." 
They thus avoid the pitfalls of both Richardson-Lucy iteration (no regularization) and 
model fitting (parametric bias). 

Our focus is on situations where the velocities are measured discretely, as would be the 
case for stars in a globular cluster, planetary nebulae around a galaxy, galaxies in a cluster, 
etc. Modifying the algorithms to deal with spatially-continuous data is straightforward. 

We assume that the number density profile i/(g7, z) of the kinematical sample is known. 
Nonparametric algorithms for estimating u in the spherical geometry have been described 
by Merritt (1993a) in the case that the surface brightness is measured directly, and by 
Merritt & Tremblay (1994) in the case that the surface brightness must first be inferred 
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from discrete positions. These algorithms are easily generalized to the axisymmetric case. 
We also continue to assume, as discussed above, that the observer lies in the equatorial 
plane of the observed system, since otherwise the inverse problem is guaranteed not to have 
a unique solution (Gerhard & Binney 1996). 



3.1. A Variational Problem 

The inverse problems to be solved here belong to a class of problems that have been 
well studied by statisticians in the last few years (e.g. Wahba 1990). Let the data be 
di,d2,---,dn and the model u. The data are related to the model via 

di = CiU + Ci, (21) 

where the Ci are known linear functionals - in our case, projections - of u, and the 
represent measurement errors or scatter from some other source. For instance, in the 
determination of the rotational velocity field v^, we would have 

u = v,p{vj, z) (22) 

and 

roo dzU 

CiU = 2XiY^{Xi, Ziy-^ / t/(OT, Zi)u{'nj, Zi)—= 

= v^{Xi,Zi), (23) 

the observed rotation at point {Xi,Zi). If the line-of-sight rotational velocity were 
measured directly, then di — Vp{Xi, Z^) and the would be measurement errors. If instead 
discrete velocities were measured, then di = Vi, the observed velocity of the ith star, and 
= Vi — Vp{Xi, Zi), the deviation (due both to measurement errors and to the intrinsic 
velocity dispersion) of the ith star's measured velocity from the true mean line-of-sight 
velocity at point {Xi,Zi). 

We seek a smooth function u such that Cu is suitably close to the data. One possible 
approach is to construct a continuous approximation to the function represented by the data 
- e.g. Vp{X, Z) - using a kernel or spline smoother, and then to operate mathematically 
on this function via the inverse operator C^^ (if it exists) to produce the estimate it. This 
approach is consistent in the sense defined by statisticians as long as the functions that are 
fit to the data are asymptotically unbiased, i.e. nonparametric (Wahba 1990, p. 19), and 
such an approach has been used with success by Merritt & Gebhardt (1994) to solve the 
dynamical inverse problem in spherical geometry. Such an approach will not be followed 
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here because one of the inverse problems that we wish to solve - the recovery of f{E, L^) 
from u, and $ - does not have a simple inverse operator. In addition, one sometimes 
wishes to place physical constraints (e.g. positivity) on the solutions and this is extremely 
difficult to do if the function approximations are carried out in data space rather than 
solution space. 

Instead we define our solutions implicitly, as the functions u that minimize functionals 
of the form 

n 

n-^^(AM-rf,)^ + AJ(m). (24) 

i=l 

The first term in equation ( P^ measures the fidelity of the model to the data. Minimizing 
this term alone would lead to an unacceptably noisy solution due to the ill-conditioning 
of the inverse operator. The second term measures the degree to which the solution is 
unsmooth. One standard choice for the "penalty function" J{u) is 

^(«) = i™W= J^^//(g;^) ™>2. (25) 

(Here and below, a two-dimensional u = u{x, y) is assumed.) The function u that minimizes 
equation (p^) with penalty function ( pS]) is the so-called "thin plate spline" solution 
(Duchon 1977; Meinguet 1979). Remarkably, this solution can be found in essentially 
analytic form for many simple operators C (Wahba & Wendelberger 1980). 

The qualitative nature of the solution to equations ( pi]) and (pSD depends on the choices 
of A and m. The penalty function acts as a low-pass filter, with A controlling the half-power 
point of the filter and m the steepness of the roll-off (Craven & Wahba 1979). Typically one 
chooses m = 2; the smoothness of the solution is then controlled by varying A. Too large 
a value for A will oversmooth the solution, i.e. reduce its curvature; too small a value will 
yield an overly noisy solution. The larger the data set and the smaller the measurement 
errors, the smaller the value of A that can be profitably used and the more accurate the 
solution. Thus the bias - the average deviation of the solution from the true function - falls 
to zero as the sample size n is increased. In parametric algorithms, by contrast, the bias 
remains finite even as n tends to infinity, since the adopted parametric form is guaranteed 
to be incorrect at some level and this incorrectness does not diminish as n is increased. It 
is for this reason that parametric methods are unsuited to data-based inverse problems. 

Data are typically incomplete or cover only a limited region. The use of a penalty 
function guarantees that the solution will tend toward a predictable form in parts of 
solution space that are not strongly constrained by the data. For instance, the thin-plate 
spline solution with m = 2 yields a solution that is linear in x and y wherever the data do 
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not force a different functional form, since any linear function is assigned zero penalty by 



expression ([25|). This very useful feature of solutions found via penalty functions means 
that there is never any need to extend or augment the data in ad hoc ways. Nonparametric 
techniques based on direct smoothing of the data, e.g. via kernels, do not share this nice 
property. 

Many variations on this basic scheme exist. If the data errors are not normally 
distributed, one can choose the first term in equation (plf ) to be some more robust measure 
of the fidelity of the model to the data. Such a modification would be appropriate in the 
study of galaxy cluster dynamics where interlopers are common. Another variation is to 
modify the form of the penalty function. The simplest such variation is to add a factor 
^m-Q Tf^[^\^ some positive constant; changing k is equivalent to changing the units in which 
y is measured. One can also include in the penalty function a weighting function w{x,y) 
that is large where the solution u is expected to be small, thus encouraging the fluctuations 
in the solution to remain a fixed fraction of u in amplitude everywhere. Finally, one can 
replace the differential operator gfe^yfi by some more complicated operator, which defines 
as "perfectly smooth" some different class of functions (e.g. Silverman 1982). 

For many choices of J{u), including the thin plate penalty function defined above, 
the quantity to be minimized is quadratic in u. This means that a standard quadratic 
programming routine (Rustagi 1994) can be used to find the solution on some grid in model 
space. The use of a quadratic programming algorithm also allows the easy imposition of 
linear constraints, such as positivity, on the solution if desired. 

There is a huge literature concerning the choice of the smoothing parameter A. While 
"objective" schemes exist, a standard practice is to slowly decrease A until the solution 
begins to exhibit small-scale fluctuations, then stop. 



3.2. The Rotational Velocity Field 

Consider first the recovery of the rotational velocity field in the meridional plane 
V(l,{zu, z). We seek the function that minimizes 



= Y1 {vi - CiV^f + A 



1=1 



with Vj the measured velocities and 



dw'^ I \ dwdz 



POO 

J Xi 



+ 



dw 



^2 - X} 



dwdz, (26) 



(27) 



- 15 - 



We choose as our function to be optimized rather than the physically more natural 
choice uv^, since is expected to be a more nearly linear function of the coordinates and 
hence better suited to a second-derivative penalty function. 

We begin by defining a Cartesian grid in model space, zuj = zui + (j — 1)6 w, I < j < N, 
and Zk = Zi + {k — l)6z, 1 < k < N. The variables to be solved for are the N'^ values vj^k of 
the rotational velocity on the grid. The derivatives and integrals on the right hand side of 
(p6|) can easily be represented as linear combinations of the vj^k using standard expressions 
from numerical analysis; for details from a very similar problem, see Merritt & Tremblay 
(1994). Placing the Vj^k into a linear vector g, one can then write 

x' = g^Ag-Bg + C (28) 

where the quantities A, B and C are straightforward to compute but messy and will not 
be given here. The resulting expression is quadratic in the unknowns, and the constraints 
Vj^k > are linear. The values Vj^k that minimize can accordingly be found via quadratic 
programming. 

Figure 1 shows an estimate of v^{w, z) derived using 5000 discrete velocities drawn 
randomly from an oblate model belonging to Lynden-Bell's (1962) family. The adopted 
model is the flattest one consistent with non-negative /, having h/a = 0.638 in Lynden- 
Bell's notation and a = —0.814 in the notation of Dejonghe (1986). (These models have 
non-constant ellipticity, becoming spherical at large radii for all values of b/a.) The odd 
part of /, which specifies the degree of ordered motion, was set to the function derived in 
the appendix of Lynden-Bell's paper when generating the Monte-Carlo data. The resultant 
rotational velocity field has = a^f, = everywhere, an "oblate isotropic rotator." Note 
that even this extreme model from Lynden-Bell's sequence is only mildly flattened, except 
very near the center, and the rotation is correspondingly small; the peak rotation velocity is 
only about one half of the central, one- dimensional velocity dispersion. In spite of the small 
signal, however, the inversion algorithm recovers the rotational velocity field reasonably well 
near the center; the main defect is a too-shallow rise of the rotation velocity near w = 0, 
an inevitable result of the smoothing. This example suggests that data sets consisting of a 
few thousand velocities are none too large for recovering the rotational velocity field of an 
axisymmetric system in a model-independent way. 
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3.3. The Velocity Dispersions and Potential 

We now wish to make estimates of aiw, z) and aAw, z). We search for the functions 



o"^ and v'i that minimize 



X 



+ A 



+ A 



i=l 



dwdz 



+ 



2„,2 





dwdz 



dz^ 



dwdz 



dwdz 



(29) 



subject to the constraints of equation (| 



V d 



dz dw ~^ w dz ^ 







dw dz 

as well as the usual positivity constraints. The operator Ci is now defined as 



(30) 



Ci{a^vl} = 2E{X„Z,)-' f 



oo 

^ ' Z/(Ct7, Zi 
Xi 



aH^,Z,) + ^vl(w,ZA 



w^ 



w^ 



dw 



Iw^ - Xf 
(31) 

If the velocities contain measurement errors, these should be subtracted in quadrature from 
the Vi. 

We define a double grid in {w,z}, with the first A^^ points containing the cr^ values 
and the second A^^ points the v'^ values. The constraints (^) are approximated via 
finite-difference derivatives, which have the effect of linking together the values on the two 



grids. Finally, one obtains 
previous step. 



via cr: 



with f ? having been recovered in the 



Before applying the algorithm to pseudo-data, one would like some assurance that the 
information contained in the observed mean-square velocity field Vp{X,Z) is sufficient, in 
principle, to uniquely constrain both a{w, z) and ^^(ro, z). Ideally this hypothesis would be 
demonstrated by finding analytical expressions for the inverse relations. Such expressions 
could not be found (which may simply reflect limitations in the author's mathematical 
abilities), and instead the case will here be made numerically. 

Figure 2 shows how the integrated square error in the solutions a'^iw, z), d''i{w, z), 



ISE 



cr^ — 6"^ ) dwdz + 



(j^ — a^] dwdz 



(32) 
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depends on the degree to which the constraints ( pOD are imposed. That is, these constraints 
were replaced by 

- A < — — - — — + -— (a' - vl) < A, 33 

with the parameter A > - a constant at all points on the solution grid - determining how 
stringently the contraints are imposed. The algorithm was given as input the exact values, 
on a grid, of the mean square velocity Z) of the rotating oblate model of Figure 1. 

Solutions were then obtained as a function of A. Figure 2 demonstrates that the error in 
the solutions tends to zero along with A, which suggests - at least for this particular model 
- that the constraints (^) contain just enough information to yield the correct inversion. 
This experiment was repeated for a number of other models from the Lynden-Bell sequence 
of models. For each model examined, the solution with A = was found to be extremely 
close to the true solution. These experiments hardly constitute a rigorous proof that the 
inversion is uniquely defined for every possible two-integral oblate model, however. 

Figures 3a and 3b illustrate the recovery of cr(ro, and a^{w^z) given the same 5000 
velocities drawn from the rotating oblate model of Figure 1. Given the uniqueness, just 
demonstrated, of the inversion for this model, the discrepancies apparent in the numerical 
solutions of Figure 3 are primarily due to the finiteness of the data set rather than to any 
limitations of the algorithm. Different, 5000- velocity Monte-Carlo realizations of the same 
model were found to yield solutions that were different in their details but that had roughly 
the same average error. 

The gravitational potential is obtained by applying either of equations (10) or (11), in 
finite-difference form, to the derived a and o^. The result is shown in Figures 3c and 3d. 
The derived potential is remarkably similar to the correct one except for slightly reduced 
gradients, again an unavoidable result of the smoothing. 



3.4. The Distribution Function 

We now want to find the functions f+{E, Lz) and f-{E, L^) that generate the estimated 
number density u{w, z) and rotational velocity field V(f,{w, z) in the reconstructed potential 
^{vj,z). This is a simpler version of a problem solved in Merritt (1993b), and we follow a 
similar approach here. A rectangular grid is defined in {E^ L2)-space, and the unknowns 
f+j^k ^^"i f-j,k assumed constant over each cell (Figure 4). The configuration-space 
density at any point {w,z) is the sum of the contributions Vceii = 4:Hzu^^AEALzf+ji^, for 
each cell such that L^j^k < [E — ^{zu, z)]. (Cells that lie partly below and partly above 
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this curve contribute by an amount proportional to the area that hes below the curve.) 
Similarly, each cell contributes ^v^^^^ — 47rro~^A£'AL^/_^- ^ to the estimate of v{vj^ z)v^. 
The quantity to be minimized is 



X 



cell 



da 



dEdLl^4:) 



in the case of /+, and 

n / \ 2 

i=l \ cell / 



dEdL, 



+ X2{E) 



dLl 



dEdLl^B) 



in the case of /_. The integrals are understood to extend over the region in the {E, L^) 
plane such that < < Lmax{E), where Lmax{E) is the maximum value of attainable 
at energy E (Fig. 4). The smoothing parameters Ai and A2 are allowed to depend on the 
energy since / falls off much more rapidly than linearly with E^ and a penalty function 
with constant A would not strongly penalize a solution that was smooth at large \E\ but 
noisy for E Furthermore, a different degree of smoothing is permitted in the E and 
directions since / is expected to be much more homogeneous in than in E. 

Figure 5 illustrates the recovery of /+ and /_ using the approximations to ^(vj,z) and 
obtained in the previous sections. The smoothing parameters were chosen to vary as 
lE'l^^'^, with A2/A1 = 100. Once again, the only obvious systematic error takes the form of 
a bias due to the smoothing. 

The derivation of f{E,Lz) given //(cc, 2;) and $(cc7,z) is a classical problem in stellar 
dynamics. Lynden-Bell (1962), Hunter (1975) and Dejonghe (1986) demonstrated the 
uniqueness of the solution, and recently Hunter & Qian (1993) have presented an elegant 
algorithm for carrying out the inversion given analytic expressions for u and 

Hunter & Qian's algorithm is not regularized and hence not suitable for use with 
real data. It has nevertheless been applied in this way, after first replacing the data with 
analytic models (Qian et al. 1995). This approach is dangerous at best since any slight error 
in the choice of fitting functions will be amplified enormously by the inversion operator. 
Solutions so obtained for / may be smooth and physically appealing, but they are unlikely 
to be close to the true solutions. 
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From a numerical point of view, the only non-trivial aspect of solving the Lynden- 
Bell-Hunter problem is the regularization; the inversion itself is essentially just a matrix 
operation once the integral operator has been represented via some discrete approximation. 
Much effort has been directed recently toward refining the discretization (Kuijken 1995) or 
the inversion (Hunter & Qian 1993), with curiously little attention paid to the more crucial 
question of regularization. 



4. SUMMARY 

Given measurements of the line-of-sight rotational velocity and the velocity dispersion 
over the two-dimensional image of an edge-on, axisymmetric galaxy, one can derive the 
unique functions ^{zu,z) and f{E,Lz) that reproduce those data, without the necessity of 
assuming that mass follows light. The validity of such a model can then be tested using 
other data, such as the full line-of-sight velocity distributions or the proper motions. Failure 
to reproduce the data using this method would imply that / depends on a third integral of 
motion (or that one of the geometric assumptions, axisymmetry or zero inclination, was 
violated). This approach is a generalization of the one pioneered by Binney, Davies and 
Illingworth (1990) which assumes that mass follows light and which derives both $ and / 
from the luminosity distribution alone, without using the kinematical data except in the 
normalization of $. We have presented numerical algorithms, suitable for use with noisy 
and incomplete data, for carrying out the required inversions and shown that they provide 
smooth and accurate estimates of / and $ using simulated data sets derived from oblate 
models. 

In a companion paper, the algorithms described here will be applied to velocity data 
from the globular cluster uj Centauri. 

The results described here suggest the following avenues for future work. 

1. A mathematical - as opposed to numerical - demonstration of uniqueness (or 
non-uniqueness) of the solutions to the coupled equations (|T2|) and (|1^) would be of 
fundamental importance. 

2. The second-derivative penalty functions adopted here are almost certainly 
non-optimal and it would be worthwhile to investigate more general forms. In addition, it 
is possible that the solutions to the roughness-penalized inverse problems defined here can 
be found analytically, as in Wahba & Wendelberger (1980). 

3. As shown most recently by Gerhard & Binney (1996), the axisymmetric inverse 
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problem becomes under-determined when the stellar system is viewed from a direction that 
is not parallel to the symmetry plane. Additional information must therefore be added if 
one is to solve the inverse problem for an arbitrary orientation. (One would also like to 
infer that orientation from the data.) There is at present no clear understanding of what 
form that extra information should take. 

4. One can define an alternate inverse problem based on the Ansatz that mass follows 
light, or more generally that the form of the potential ^{to, z) is known a priori; for instance, 
the potential might be derived from X-ray data, from the statistics of gravitational Icnsing, 
etc. This is of course the usual assumption that is made when modelling axisymmetric 
galaxies; however the goal would be to learn something unique about the degree of velocity 
anisotropy cr^/cr^ from the velocity moment data, as in the analogous spherical problem 
(Binney & Mamon 1982). Unfortunately one can show that the assumption of a known 
potential is not sufficient in itself to uniquely constrain the anisotropy, even for an edge-on 
system; one must make additional assumptions, e.g. that the velocity ellipsoid is everywhere 
aligned with the coordinate axes. Nevertheless, more work along these lines would help to 
elucidate the more difficult inverse problem for a three-integral /. 

This work was supported by NSF grant AST 90-16515 and NASA grant NAG 5-2803. 
T. Fridman kindly assisted in the plotting of the figures. I thank W. Dehnen and O. 
Gerhard for lively discussions on the topic of axisymmetric modelling, and for pointing out 
some mathematical errors in an early version of the paper. I also thank H. Dejonghe for 
many useful insights on problems of this sort. 
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Fig. 1.- 

Recovery of the rotational velocity field v^{'OJ, z) given 5000 velocities derived from an 
oblate, Lynden-Bell (1962) model with maximum flattening. Heavy contours: estimated 17^; 
thin contours: exact v^. 

Fig. 2.- 

Dependence of the ISE of the velocity dispersion solutions, Eq. (|32D, on the parameter 
A that determines how stringently the constraints (^) are applied. 

Fig. 3.- 

Recovery of the velocity dispersions and potential from the same data used Fig. 1. 
Heavy contours: estimated values; thin contours: exact values, (a) cr; (b) (c) $; (d) 
$(ro,0). 

Fig. 4.— 

Grid-based scheme for recovery of f+{E,Lz). (a) Solution grid. Upper curve is the 
maximum as a function of E; lower curve is the characteristic curve for a given {w,z). 
(b) Projection of a single {E, Lz) grid cell into (cu, 2;)-space. 

Fig. 5.— 

Recovery of f+{E,Lz) (a) and f_{E,Lz) (b) given data generated from the oblate 
Lynden-Bell model. Heavy contours: estimated values; thin contours: exact values. 
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